Remove clean markdown generation and artifacts #833
Stop generating and committing clean markdown files in CI, simplify markdown serving/indexing to use source content directly, and remove .md URL suffixes from llms output and markdown alternates for the current main behavior. Made-with: Cursor
scripts/pagefind.ts (outdated)
 /**
- * Converts clean markdown to HTML for Pagefind indexing.
- * This function expects pre-cleaned markdown (no MDX syntax).
+ * Converts markdown to HTML for Pagefind indexing.
  */
 async function markdownToHtml(markdownContent: string): Promise<string> {
Maybe I'm missing something, but Pagefind shouldn't need to convert to HTML, since it runs as a postbuild step.
Yeah, it called an API that required HTML internally, but that API was replaced, so this can be removed.
Actually, it is needed so that pages look better in the search results; otherwise they show only flat text.
@teallarson, with the changes you made for Algolia, is this still required? We should not have this dependency.
…pagefind
Use pagefind's addCustomRecord API instead of converting markdown to HTML, removing the remark/remark-rehype/rehype-stringify dependency. Extract extractFrontmatterTitle and stripMdxSyntax as tested helpers.
Made-with: Cursor
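The commit names extractFrontmatterTitle as a tested helper but does not show its code. A minimal sketch of what such a helper might do, assuming it receives the raw MDX source and returns the title from a leading YAML frontmatter block (the signature and the one-line `title: value` parsing are assumptions, not the PR's implementation):

```typescript
// Hypothetical sketch: the PR's real extractFrontmatterTitle is not shown.
// Returns the `title` from a leading YAML frontmatter block, or undefined.
function extractFrontmatterTitle(source: string): string | undefined {
  // Frontmatter must open on the first line and close with its own `---` line.
  const match = /^---\r?\n([\s\S]*?)\r?\n---(?:\r?\n|$)/.exec(source);
  if (!match) return undefined;

  for (const line of (match[1] ?? "").split(/\r?\n/)) {
    // Only simple one-line `title: value` entries are handled.
    const titleMatch = /^title:\s*(.*)$/.exec(line);
    if (titleMatch) {
      const raw = (titleMatch[1] ?? "").trim();
      // Remove one pair of matching surrounding quotes, if present.
      const unquoted = raw.replace(/^(["'])(.*)\1$/, "$2");
      return unquoted === "" ? undefined : unquoted;
    }
  }
  return undefined;
}
```

A full YAML parser would handle multi-line or escaped titles more robustly; plain line matching keeps the sketch dependency-free.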
addCustomRecord treats content as flat text, so markdown syntax like [text](url) appeared raw in search result excerpts. Restore the remark/rehype pipeline to convert stripped MDX to HTML before indexing. Also strip JSX component tags (<Callout>, <Steps>, etc.) from MDX before conversion for cleaner search content. Made-with: Cursor
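Stripping JSX component tags before the remark/rehype conversion could look roughly like the sketch below. It assumes components are capitalized tags (such as <Callout> or <Steps>) and that only single-line import/export statements need removing; the name stripJsxComponents is hypothetical and this is not the PR's stripMdxSyntax. Fenced code blocks pass through untouched so code samples that mention components survive:

```typescript
// Hypothetical sketch: drops MDX-only syntax (single-line import/export
// statements and capitalized JSX component tags) while keeping the text
// between tags and leaving fenced code blocks verbatim.
function stripJsxComponents(mdx: string): string {
  const out: string[] = [];
  let inFence = false;

  for (const line of mdx.split("\n")) {
    // Toggle on ``` or ~~~ fences so code samples are preserved as-is.
    if (/^\s*(`{3}|~{3})/.test(line)) {
      inFence = !inFence;
      out.push(line);
      continue;
    }
    if (inFence) {
      out.push(line);
      continue;
    }
    // Top-level ESM statements only matter to the MDX compiler.
    if (/^(import|export)\s/.test(line)) continue;

    // Remove opening, closing, and self-closing component tags,
    // keeping any text between them (e.g. the body of a <Callout>).
    const stripped = line.replace(/<\/?[A-Z][\w.]*(?:\s[^<>]*)?\/?>/g, "");
    // Skip lines that contained nothing but component tags.
    if (stripped.trim() === "" && line.trim() !== "") continue;
    out.push(stripped);
  }
  return out.join("\n");
}
```

Lowercase HTML tags like <div> are left alone, so standard markdown and inline HTML reach the HTML conversion unchanged.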
I think we can remove this whole API route/file. It's used by the copy button to generate the markdown, but we can just fetch the current URL with the right headers instead of calling it.
   setLoading(true);
   try {
-    const response = await fetch(`/api/markdown${pathname}.md`);
+    const response = await fetch(`/api/markdown${pathname}`, {
I don't think this needs to call an API; we can fetch the same URL and path and just add the headers.
…mstxt workflow
- Extract cleanMdxToMarkdown() into app/_lib/clean-mdx.ts with tests. Strips frontmatter, imports, exports, and JSX tags from MDX while preserving code blocks and standard markdown content.
- /api/markdown route uses cleanMdxToMarkdown for non-toolkit pages so content negotiation returns clean markdown instead of raw MDX.
- Copy buttons (copy-page-override, page-actions) now fetch the page URL with Accept: text/markdown instead of calling /api/markdown directly.
- llmstxt workflow re-enabled on PRs and pushes to main for MDX changes.
Made-with: Cursor
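Fetching the page's own URL with an Accept: text/markdown header, as the copy buttons now do, can be sketched as follows. The helper name fetchPageMarkdown and the injectable FetchLike parameter are assumptions made so the sketch runs outside a browser; the PR's components may structure the request differently:

```typescript
// Minimal response shape the sketch relies on; a real fetch Response has more.
type FetchLike = (
  input: string,
  init: { headers: Record<string, string> },
) => Promise<{ ok: boolean; status: number; text(): Promise<string> }>;

// Hypothetical helper: requests the current page's own URL, asking for
// markdown instead of HTML via the Accept header.
async function fetchPageMarkdown(pathname: string, fetchImpl: FetchLike): Promise<string> {
  const response = await fetchImpl(pathname, {
    headers: { Accept: "text/markdown" },
  });
  if (!response.ok) {
    // Surface failures so the copy button can show an error state.
    throw new Error(`Markdown request for ${pathname} failed with ${response.status}`);
  }
  return response.text();
}
```

In the browser, a copy button could call `fetchPageMarkdown(window.location.pathname, fetch)` and pass the result to `navigator.clipboard.writeText`. Because the URL is unchanged, no separate /api/markdown route is involved.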
Cloudflare handles content negotiation and MDX-to-markdown conversion at the edge. Remove the server-side cleanup function and its tests. Made-with: Cursor
The previous SHA referenced a force-pushed commit that no longer exists, causing the llmstxt workflow to fail its diff and skip regeneration. Made-with: Cursor
Cloudflare handles content negotiation and markdown serving at the edge. Remove the API route and all middleware code that rewired requests to it: handleContentNegotiation, buildMarkdownPath, .md URL rewrites, AI agent detection, and related helpers. Made-with: Cursor
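For context on what moved to the edge: content negotiation means choosing a representation from the request's Accept header. The sketch below is illustrative only (it is neither the removed handleContentNegotiation nor Cloudflare's implementation) and assumes markdown should be served only when the client ranks text/markdown strictly above text/html by quality value, ignoring wildcards:

```typescript
// Illustrative only: not the removed handleContentNegotiation and not
// Cloudflare's edge logic. Serves markdown only when the client ranks
// text/markdown strictly above text/html by quality value (q).
function prefersMarkdown(accept: string | null): boolean {
  if (!accept) return false;

  // Map each listed media type to its q-value (defaults to 1).
  const quality = new Map<string, number>();
  for (const part of accept.split(",")) {
    const [rawType = "", ...params] = part.split(";");
    let q = 1;
    for (const param of params) {
      const [key = "", value = ""] = param.split("=");
      if (key.trim().toLowerCase() === "q") {
        const parsed = Number.parseFloat(value);
        if (!Number.isNaN(parsed)) q = parsed;
      }
    }
    quality.set(rawType.trim().toLowerCase(), q);
  }

  // Wildcards like */* are deliberately ignored, so a browser's default
  // Accept header keeps receiving HTML.
  const markdownQ = quality.get("text/markdown") ?? 0;
  const htmlQ = quality.get("text/html") ?? 0;
  return markdownQ > 0 && markdownQ > htmlQ;
}
```

Ignoring wildcards is a deliberate simplification: treating */* as a match would make ordinary browser requests ambiguous.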
}

function SearchHit({ hit }: { hit: HitRecord }) {
function getHitUrl(hit: DocSearchRecord): string {
Not sure why this is here now. Maybe some conflict with main?
The /api/markdown route was the only consumer of public/toolkit-markdown/. With that route deleted, nothing reads the generated files. Remove:
- generate-toolkit-markdown.ts (script + test)
- pagefind-toolkit-content.ts (markdown formatter + test)
- toolkit-markdown build step from package.json
Made-with: Cursor
… one Made-with: Cursor
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Note
Medium Risk
Removes the app/api/markdown endpoint, markdown rewrites in middleware.ts, and the committed public/_markdown artifacts, which can break any consumers relying on .md URLs or the old copy/export behavior. CI/build scripts are also simplified, so reviewers should verify that markdown serving and LLM tooling still work end-to-end.
Overview
Removes the clean-markdown pipeline: deletes the generate-markdown GitHub Action, drops the toolkit-markdown/generate:markdown/postbuild scripts, and removes the committed public/_markdown/** outputs.
Deletes app/api/markdown/[[...slug]] and strips middleware.ts and app/layout.tsx of .md routing/content-negotiation support (including text/markdown alternates), while updating the “Copy page” override to request markdown by fetching the current pathname with Accept: text/markdown.
Simplifies toolkit docs page actions by removing the custom copy button, and updates the llmstxt workflow to run on main pushes for English-doc changes and to request a team review on auto-generated PRs.
Written by Cursor Bugbot for commit ffcf096.